7 research outputs found

    Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

    Background: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. The lack of knowledge on the number of different strains in a quasispecies population is observed to hinder the precision of existing Viral Quasispecies Spectrum Reconstruction (QSR) methods due to the uncontrolled reconstruction of a large number of in silico false positives. In this work, we formulated a novel probabilistic method for strain richness estimation specifically targeting viral quasispecies. By using this approach we improved our recently proposed spectrum reconstruction pipeline ViQuaS to achieve higher levels of precision in reconstructed quasispecies spectra without compromising the recall rates. We also discuss how another popular existing QSR method, ShoRAH, can be improved using this new approach. Results: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. Conclusions: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors.
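    The precision, recall and F-score figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that a reconstructed strain counts as correct only when it exactly matches a true strain; the abstract does not state the matching criterion, and the function name and toy sequences below are hypothetical.

```python
def precision_recall_f1(true_strains, reconstructed_strains):
    """Return (precision, recall, F-score) for a reconstructed strain set.

    Assumption: a reconstruction is a true positive only if it is an
    exact match to one of the true strains.
    """
    truth = set(true_strains)
    predicted = set(reconstructed_strains)
    true_positives = len(truth & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data: three true strains, four reconstructed ones.
truth = ["ACGT", "ACGA", "ACGC"]
unfiltered = ["ACGT", "ACGA", "TTTT", "GGGG"]
# Keeping only as many reconstructions as an assumed richness estimate (3).
filtered = ["ACGT", "ACGA", "TTTT"]

print(precision_recall_f1(truth, unfiltered))
print(precision_recall_f1(truth, filtered))
```

    In this toy case, discarding one false positive raises precision from 1/2 to 2/3 while recall stays at 2/3, which is the kind of effect the abstract attributes to a richness estimate.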

    Assessing Species Diversity Using Metavirome Data: Methods and Challenges

    Assessing biodiversity is an important step in the study of microbial ecology associated with a given environment. Multiple indices have been used to quantify species diversity, which is a key biodiversity measure. Measuring species diversity of viruses in different environments remains a challenge relative to measuring the diversity of other microbial communities. Metagenomics has played an important role in elucidating viral diversity by conducting metavirome studies; however, metavirome data are of high complexity requiring robust data preprocessing and analysis methods. In this review, existing bioinformatics methods for measuring species diversity using metavirome data are categorised broadly as either sequence similarity-dependent methods or sequence similarity-independent methods. The former includes a comparison of DNA fragments or assemblies generated in the experiment against reference databases for quantifying species diversity, whereas estimates from the latter are independent of the knowledge of existing sequence data. Current methods and tools are discussed in detail, including their applications and limitations. Drawbacks of the state-of-the-art method are demonstrated through results from a simulation. In addition, alternative approaches are proposed to overcome the challenges in estimating species diversity measures using metavirome data. DH is fully supported by the PhD scholarships of The University of Melbourne. This work is also supported by Australian Research Council grant LP140100670 and the industry partner YourGeneBioScience.

    ENVirT: inference of ecological characteristics of viruses from metagenomic data

    Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by the existing methods PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. Conclusions: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations. This work was supported partially by the Australian Research Council [grant numbers LP140100670 and DP150103512] and the Biodiversity Research Center, Academia Sinica, Taiwan. DJ, DH, DS and YS were funded by the MIFRS and MIRS scholarships of The University of Melbourne. Publication costs were funded by The Australian National University.

    Uncovering genetic heterogeneity in clinical and environmental viral metagenomes using next generation sequencing

    © 2015 Dr. Demuni Duleepa Lasith Jayasundara. The focus of this thesis is understanding the genetic heterogeneity of clinically important viral populations that behave as quasispecies inside infected host organisms and environmental viral populations that exist in different ecosystems such as oceans and soil. This is achieved by developing novel bioinformatics tools to analyze next generation viral metagenomic data. The superiority of the newly developed algorithms in comparison to the existing state-of-the-art methods is demonstrated using both simulated and real next generation sequencing data.

    Real-time analysis of hospital length of stay in a mixed SARS-CoV-2 Omicron and Delta epidemic in New South Wales, Australia

    Background: The distribution of the duration that clinical cases of COVID-19 occupy hospital beds (the ‘length of stay’) is a key factor in determining how incident caseloads translate into health system burden. Robust estimation of length of stay in real-time requires the use of survival methods that can account for right-censoring induced by yet unobserved events in patient progression (e.g. discharge, death). In this study, we estimate in real-time the length of stay distributions of hospitalised COVID-19 cases in New South Wales, Australia, comparing estimates between a period where Delta was the dominant variant and a subsequent period where Omicron was dominant. Methods: Using data on the hospital stays of 19,574 individuals who tested positive to COVID-19 prior to admission, we performed a competing-risk survival analysis of COVID-19 clinical progression. Results: During the mixed Omicron-Delta epidemic, we found that the mean length of stay for individuals who were discharged directly from ward without an ICU stay was, for age groups 0–39, 40–69 and 70+, respectively, 2.16 (95% CI: 2.12–2.21), 3.93 (95% CI: 3.78–4.07) and 7.61 days (95% CI: 7.31–8.01), compared to 3.60 (95% CI: 3.48–3.81), 5.78 (95% CI: 5.59–5.99) and 12.31 days (95% CI: 11.75–12.95) across the preceding Delta epidemic (1 July 2021–15 December 2021). We also considered data on the stays of individuals within the Hunter New England Local Health District, where it was reported that Omicron was the only circulating variant, and found mean ward-to-discharge lengths of stay of 2.05 (95% CI: 1.80–2.30), 2.92 (95% CI: 2.50–3.67) and 6.02 days (95% CI: 4.91–7.01) for the same age groups. Conclusions: Hospital length of stay was substantially reduced across all clinical pathways during a mixed Omicron-Delta epidemic compared to a prior Delta epidemic, contributing to a lessened health system burden despite a greatly increased infection burden. Our results demonstrate the utility of survival analysis in producing real-time estimates of hospital length of stay for assisting in situational assessment and planning of the COVID-19 response.
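    To illustrate why right-censoring matters for length-of-stay estimates, here is a minimal sketch of the Kaplan-Meier estimator. This is a simpler, single-event technique, not the competing-risk analysis the study describes, and the function name and patient data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: days in hospital so far for each patient.
    observed:  True if the stay has ended (e.g. discharge), False if the
               patient is still in hospital (right-censored).
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        leaving = 0
        # Group all patients sharing the same duration.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without counting as events.
        at_risk -= leaving
    return curve


# Hypothetical stays in days; two patients are still admitted (censored).
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
for day, prob in kaplan_meier(durations, observed):
    print(f"day {day}: P(still in hospital) = {prob:.2f}")
```

    Simply averaging the completed stays would ignore the two patients still admitted and bias the estimate downward; the estimator instead keeps them in the risk set until their last observed day.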